A self-describing
database system uses 
an active and 
integrated data 
dictionary to provide 
metadata to systems 
and users. The data 
dictionary system uses 
the services of the 
database system to 
manage metadata. 



A database system should support a
rich variety of metadata describing
and controlling the management
and use of data. Few database systems
today provide even rudimentary integrated
metadata management facilities.

This article presents a self-describing 
database system. Its active and integrated 
data dictionary system provides the only 
source of metadata to users, programs, 
and the database system, and uses the ser- 
vices offered by the database system for 
metadata management. We focus on the 
design of a self-describing metaschema 
and a formalism for specifying some of the 
operations controlling the evolution of 
database schemata. 



Architecture for self- 
describing database 
systems 

A self-describing database system and
its environment are illustrated in Figure 1.
The data mapping control system, or
DMCS, supports and enforces two orthogonal
dimensions of data description,
the point-of-view dimension and the
intension-extension dimension.

The point-of-view dimension has three
levels of data description: the information
meaning described in the conceptual
schema, the external data representations
described in external schemata, and the
internal physical data structure layout
described in the internal schema. These



three levels of data description result in
databases that are flexible and adaptable 
to changes in the way users view the data 
and in the way data is stored. This com- 
bination of flexibility and adaptability is 
usually called data-independence. 3 

The intension-extension dimension has 
four levels of data description: 

(1) the information about the data 
model supported by the database 
system described in the meta- 
schema, 

(2) the information about the manage- 
ment and use of data described in 
the data dictionary schema, 

(3) the information about specific ap- 
plications described in the applica- 
tion schemata, and 

(4) the application data. 

Each level of data description in the 
intension-extension dimension is the in- 
tension of the succeeding data description 
and the extension of the preceding data 
description. An intension completely 
describes and controls changes of an ex- 
tension. All the levels of the intension- 
extension dimension are described in 
terms of the same data model. A descrip- 
tion of the metaschema is explicitly stored 
as part of its own extension— it is self- 
describing. The stored metaschema can be 
retrieved, but it cannot be changed; it is 
self-describing, not self-destructing. 

The DMCS can be thought of as a 
DBMS stripped to the bones on the one 
hand, but still a complete DBMS on the 
other hand. It supports the set of elemen- 
tary functions essential to the maintenance 
of the two dimensions of data description.

The data language (DL) interface is the
data manipulation language for the data
model. Because the metaschema is
self-describing, all data, including
descriptions, is defined, retrieved, and
manipulated through the DL interface. No data
definition language is needed.
The data management tool box contains
software that is plug-compatible with the
DMCS through the DL interface. A data
management tool supports high-level
functions that, although important, are
not essential to data management. To
produce a plug-compatible data management
tool, information about the data model
must be retrievable through the DL
interface. This is exactly why the metaschema is
explicitly stored. Each of these tools will
have its own user interface, but each must
interface with the DMCS through the DL
interface. A database administrator would
develop or buy off-the-shelf data
management tools, such as schema design aids,
software documentation packages,
high-level query language processors, report
generators, etc., to supplement the
elementary functions supported by the
DMCS. He or she would develop or buy
off-the-shelf definitions of the data
dictionary schema and definitions of
application schemata for classes of applications.

The internal data language (i-DL)
interface is the interface through which all data
is passed from the DMCS to the operating
system supporting the DMCS.

Figure 1. Self-describing database system environment.

The two orthogonal dimensions of data
description supported by the DMCS, the
point-of-view and the intension-extension
dimension, are illustrated in Figure 2.

Any system in this new architecture is
born with data structures to hold the
metaschema extension, that is, the data
dictionary schema. Initially, the data
dictionary schema consists of the stored
description of the metaschema. The data
dictionary schema can subsequently be
expanded into a full data dictionary schema
describing and controlling the management
and use of application databases.
Initially, the extension of the data
dictionary schema, the data dictionary data,
consists of an empty set of tables.

The architecture supports software
plug-compatibility and data
plug-compatibility, which gives the database
administrator the freedom to choose the
tools for data management and define the
data management strategy best suited for
the enterprise.

The architecture unifies well-established
principles and current trends in the
areas of database systems and data
dictionary systems. It is a generalization of
the ANSI/SPARC DBMS Framework,
which only considers the point-of-view
dimension, and comprises recent ideas on
the intension-extension dimension from
the International Standards Organization
Working Group (ISO/TC97/SC5/WG3)
on the conceptual schema and information
base. It supplements data model
standardization efforts and supplements data
dictionary system standardization efforts.

The architecture is based on the notion
of self-describing database systems and
has matured through several discussions in
the ANSI/SPARC Database Architecture
Framework Task Group, DAFTG. The
architecture has been accepted by ANSI/SPARC
and is being considered by ISO as a
reference model for database systems in the
late 1980's and the 1990's.

Figure 2. Point-of-view dimension (a) and intension-extension dimension (b).

The metaschema 

We use the relational data model as an
example.4 However, we use a graphic
formalism inspired by the object-role data
model. The symbols in Figure 3 represent
the definition of a relation with name
r and two attributes with names a1 and a2;
the attributes are defined over a nonlexical
domain with name d1 and a lexical
domain [...]

The concepts in relational schemata are
definitions of relations, domains, and
attributes. The entity names used in these
definitions are relation names, domain
names, and attribute names. Accordingly,
[...] named relation, domain, and
attribute, and the lexical domains named
[...] and attribute-name.
The three unary relations
named relation, domain, and attribute
define sets of entities in existence. The
relation named reln defines relationships
between existing relations and their
relation-names. The relation named attn defines
relationships between attributes and
attribute-names. The relation named
domn defines relationships between
domains and their domain-names and
lexicality. Finally, the relation named rdas
defines the relationships between
relations, domains, and attributes. The keys,
indicated by double-headed arrows, point
out attributes, the values of which uniquely
identify tuples in the relation. Note that
no two relations can have the same name
and no relation has more than one name.
The same restriction applies to domains
and their names. In rdas we indicate that
an attribute is a unique entity related to at
most one domain and relation.

There are several more constraints
involved: the relations relation, domain,
and attribute model sets of entities in
existence. Therefore, reln[relation] ⊆
relation[relation], and corresponding rules
apply for domains and attributes. On the
other hand, we insist that all relations have
names, so actually reln[relation] =
relation[relation]. Also, attribute names must
be unique within relations.
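
The constraints just listed can be checked mechanically. The following is a minimal sketch in Python, not part of the original formalism: the class name CoreMetaschema, the helper violations, and the choice to store each relation as a set of tuples are all assumptions made for illustration. The argument orders reln(name, relation), attn(attribute, name), and rdas(relation, domain, attribute) follow the way the figures use them; domn is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class CoreMetaschema:
    """Extension of the core metaschema relations, held as sets of tuples."""
    relation: set = field(default_factory=set)   # relation surrogates
    domain: set = field(default_factory=set)     # domain surrogates
    attribute: set = field(default_factory=set)  # attribute surrogates
    reln: set = field(default_factory=set)       # (name, relation)
    attn: set = field(default_factory=set)       # (attribute, name)
    rdas: set = field(default_factory=set)       # (relation, domain, attribute)


def has_duplicates(values):
    """True if some value occurs more than once."""
    values = list(values)
    return len(values) != len(set(values))


def violations(m):
    """Return the constraints from the text that the extension m violates."""
    found = []
    # Every relation has a name and every named relation exists.
    if {r for _, r in m.reln} != m.relation:
        found.append("reln[relation] = relation[relation]")
    # Keys of reln: no shared names, no relation with two names.
    if has_duplicates(n for n, _ in m.reln):
        found.append("relation names are unique")
    if has_duplicates(r for _, r in m.reln):
        found.append("a relation has one name")
    # Key of rdas: an attribute is related to at most one (relation, domain).
    if has_duplicates(a for _, _, a in m.rdas):
        found.append("an attribute belongs to one relation and domain")
    # Attribute names must be unique within a relation.
    names_in_relation = [(r, n) for r, _, a in m.rdas
                         for a2, n in m.attn if a2 == a]
    if has_duplicates(names_in_relation):
        found.append("attribute names are unique within a relation")
    return found
```

In the article these rules are not checked after the fact; they are enforced by the conditions of the update dependencies. A checker like this one is only useful for validating an extension that was loaded by some other route.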

These rules, together with the keys and
several other rules, will all be specified as
part of the operations defined in the
metaschema.

Boxes represent metaschema relations.
Full circles represent nonlexical domains
of surrogates used to model entities.
Broken circles represent lexical domains
used to model entity-names. Arrows
represent keys. The metaschema is so far
self-describing. It is defined in terms of
relations, domains, and attributes, and its
definition can be stored in the database it
defines. See Figure 5. (We have omitted
the unary relations relation, domain, and
attribute.)

Operations in the core 
metaschema 

To make sure that the metaschema com- 
pletely models and controls all operations 
on its extension (the data dictionary 
schema), we define an insert, delete, and 
modify operation for each relation in the 
metaschema. From the database ad- 
ministrator's point of view these opera- 
tions are elementary, but as we shall see in 
their specification below, each involves 
several implied operations on update- 
dependent relations. All the operations we 
define must be explicitly represented in the 
metaschema, otherwise the metaschema 
does not completely model and control all 
operations on its extension. For the 
metaschema to be self-describing, it must 
explicitly model the notion of an operation 
and control all operations on the specification
of operations. We do not consider this
expansion of the core metaschema here.
For a detailed description, see Mark.9
Operations are specified in the language 



Syntax. Each operation is defined by a
set of update dependencies, each with the
following form:

   <op1> → <cond>, <op2>, ..., <opn>.

where <op1> is the operation being
defined; <opi>, i = 2, ..., n, is either an
implied operation or an implied primitive
operation; and <cond> is a condition on
the database state.

An operation <opi> has one of the
following forms:

   insert(<relation name>(<tuple spec>))
   delete(<relation name>(<tuple spec>))
   modify(<relation name>(<tuple spec>),
          <relation name>(<tuple spec>))

Figure 5. Metaschema description stored in metaschema extension.



where the <tuple spec> is a tuple variable
for the relation with the name <relation
name> and consists of a list of <domain
variable>s. The <tuple spec> in
<op1> is the formal parameter for
<op1>. All the <domain variable>s in
the <tuple spec> of <op1> are assumed
to be universally quantified. All
<domain variable>s in the <tuple
spec>s of <opi>, not bound to a universally
quantified <domain variable> in
<op1>, are assumed to be existentially
quantified. All <domain variable>s are
in caps; nothing else is.

The implied primitive operations are
assert for adding a new tuple in a relation,
retract for eliminating one, write and read
for retrieving data from the user, new for
creating a unique new surrogate, and
break for temporarily stopping the system
to do some retrieval before giving control
back to the system. The implied primitive
operations <opi> have the following
forms:

   assert(<relation name>(<tuple spec>))
   retract(<relation name>(<tuple spec>))
   write("<any text>"), or write(<domain variable>)
   read(<domain variable>)
   new(<relation name>(<tuple spec>))
   break

The <relation name> used in the
operation new must be the name of a
unary relation defined over a nonlexical
domain. The conditions <cond> are
expressions of predicates with the form
<relation name>(<tuple spec>). The
connectives used in forming the expressions
are ∧ (and) and ¬ (negation). In addition,
we use the primitive predicates
nonvar and var to decide whether or not a
<domain variable> has been instantiated.

Conditions can also be used by the user
to retrieve data from the system.



Semantics. An operation succeeds if,
for at least one of its update dependencies,
the condition evaluates to true and
all the implied operations succeed. It fails
otherwise.

When an operation is invoked, its formal
parameters are bound to the actual
parameters. The scope of a variable is one
update dependency. Existentially quantified
variables are bound to values selected
by the database system or to values
supplied by the user upon request from the
database system. Evaluation of conditions,
replacement of implied operations,
and execution of implied primitive operations
are left-to-right and depth-first for
each invoked update dependency. For the
evaluation of conditions we assume a
closed world interpretation.11
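
To make the closed world interpretation concrete, here is a small illustrative sketch in Python. It is an assumption-laden simplification, not the article's interpreter: conditions are encoded as nested tuples, the name holds is hypothetical, and existentially quantified variables are reduced to a wildcard (None, standing in for "_") that matches any value without being bound.

```python
WILDCARD = None  # plays the role of "_" in the update dependencies


def holds(db, cond):
    """Evaluate a condition against db under the closed world assumption.

    db maps relation names to sets of tuples. cond is one of
      ("rel", name, pattern)   a predicate such as reln(N,_)
      ("and", c1, c2, ...)     conjunction
      ("not", c)               negation
    A tuple that is not stored is taken to be false.
    """
    kind = cond[0]
    if kind == "and":
        return all(holds(db, c) for c in cond[1:])
    if kind == "not":
        return not holds(db, cond[1])
    if kind == "rel":
        _, name, pattern = cond
        return any(
            len(t) == len(pattern)
            and all(p is WILDCARD or p == v for p, v in zip(pattern, t))
            for t in db.get(name, ())
        )
    raise ValueError(f"unknown condition kind: {kind!r}")


def name_and_surrogate_free(db, n, r):
    """The condition ¬(reln(N,_)) ∧ ¬(reln(_,R)) for given N and R."""
    return holds(db, ("and",
                      ("not", ("rel", "reln", (n, WILDCARD))),
                      ("not", ("rel", "reln", (WILDCARD, r)))))
```

Negation here is simply "no matching tuple is stored", which is what allows a condition such as ¬(reln(N,_)) to be decided by lookup alone.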

The nondeterministic choice of a
replacement for an implied operation is
done by backtracking, selecting in order of
appearance the update dependencies with
matching left-hand sides. If no match is
found, the operation fails.

An implied operation matches the
left-hand side of an update dependency if

• the operation names are the same, and

• the relation names are the same, and

• all the domain components match.

Domain components match if they are the
same constant or if one or both of them is a
variable. If a variable matches a constant,
it is instantiated to that value. If two
variables match, they share value.
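
The matching rule can be sketched as a small unification routine. The Python below is illustrative only: the Var class, the (operation, relation, components) triple used to represent a call or a left-hand side, and the function names are assumptions introduced for this example. Bindings made before a mismatch is discovered are not undone here; a full interpreter would reset them on backtracking.

```python
class Var:
    """A domain variable; ref is None while the variable is uninstantiated."""

    def __init__(self, name):
        self.name = name
        self.ref = None


def deref(x):
    """Follow variable bindings until a constant or an unbound variable."""
    while isinstance(x, Var) and x.ref is not None:
        x = x.ref
    return x


def match_component(a, b):
    """Match two domain components, binding variables as a side effect."""
    a, b = deref(a), deref(b)
    if a is b:
        return True
    if isinstance(a, Var):
        a.ref = b          # variable meets constant or another variable
        return True
    if isinstance(b, Var):
        b.ref = a
        return True
    return a == b          # two constants must be the same


def matches(call, head):
    """True if an implied operation matches an update dependency's left side."""
    op1, rel1, args1 = call
    op2, rel2, args2 = head
    if op1 != op2 or rel1 != rel2 or len(args1) != len(args2):
        return False
    return all(match_component(x, y) for x, y in zip(args1, args2))
```

Two variables that match are linked, so instantiating either one later instantiates both, which is what "they share value" requires.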

The semantics of the primitive operations are

assert(r(t))
Its effect is r := r ∪ {t}. It always succeeds.
All components of t are constants.

retract(r(t))
Its effect is r := r \ {t}, where all components
of t are constants. It always succeeds.

write("text")
It writes the "text" on the user's screen. It
always succeeds.

write(X)
It writes the value of X on the user's
screen. It always succeeds.

read(X)
It reads the value supplied by the user and
binds it to X. It always succeeds (if the user
answers).

new(r(D))
It produces a new unique surrogate from
the nonlexical domain over which r is
defined and binds the value of the variable
D to this surrogate. It always succeeds.

break
It suspends the current execution and
makes a new copy of the interpreter available
to the user, who can use it to retrieve
the information needed to answer a question
from an operation.

We kept the list of primitive operations 
minimal to illustrate the concept. It can 
easily be extended. We should emphasize 
that the user cannot directly invoke 
primitive operations. 

The execution of assert and retract 
operations done by the system in an at- 
tempt to make an operation succeed will 
be undone in reverse order during 
backtracking. This implies that an opera- 
tion that fails will leave the database un- 
changed. 
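
One common way to obtain this all-or-nothing behavior is an undo log (a "trail") that records each effective assert and retract. The sketch below is written in Python under that assumption; the article does not say how its interpreter implements the undo, and the names Database, mark, undo_to, and attempt are hypothetical.

```python
class Database:
    """Relations as sets of tuples, plus a trail of changes for undoing."""

    def __init__(self):
        self.relations = {}
        self.trail = []

    def assert_(self, rel, t):
        """r := r ∪ {t}; recorded only if the tuple was really added."""
        tuples = self.relations.setdefault(rel, set())
        if t not in tuples:
            tuples.add(t)
            self.trail.append(("assert", rel, t))

    def retract(self, rel, t):
        """r := r \\ {t}; recorded only if the tuple was really removed."""
        tuples = self.relations.setdefault(rel, set())
        if t in tuples:
            tuples.remove(t)
            self.trail.append(("retract", rel, t))

    def mark(self):
        return len(self.trail)

    def undo_to(self, mark):
        """Undo changes in reverse order back to an earlier mark."""
        while len(self.trail) > mark:
            kind, rel, t = self.trail.pop()
            if kind == "assert":
                self.relations[rel].discard(t)
            else:
                self.relations[rel].add(t)


def attempt(db, steps):
    """Run steps in order; if one fails, restore the database and fail."""
    mark = db.mark()
    for step in steps:
        if not step(db):
            db.undo_to(mark)
            return False
    return True
```

Each step is a callable that returns True on success. Because the trail is popped from the end, the changes are undone in reverse order, as the text requires.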

The operations. We will only specify the
few operations shown in the table in
Figure 6 to illustrate the principles. For a
detailed discussion, see Mark.9

insert(relation(R))
   → var(R),
     new(relation(R)),
     insert(relation(R)).
   → nonvar(R) ∧ relation(R).
   → nonvar(R) ∧ ¬(relation(R)),
     assert(relation(R)),
     insert(rdas(R,_,_)),
     insert(reln(_,R)).

Figure 7. Insertion into relation.

insert(reln(N,R))
   → var(R),
     new(relation(R)),
     insert(reln(N,R)).
   → var(N),
     write("relation name?"),
     break,
     read(N),
     insert(reln(N,R)).
   → nonvar(N) ∧ nonvar(R) ∧ reln(N,R).
   → nonvar(N) ∧ nonvar(R) ∧ ¬(reln(N,_)) ∧ ¬(reln(_,R)),
     assert(reln(N,R)),
     insert(relation(R)).

Figure 8. Insertion into reln.

insert(attribute(A))
   → var(A),
     new(attribute(A)),
     insert(attribute(A)).
   → nonvar(A) ∧ attribute(A).
   → nonvar(A) ∧ ¬(attribute(A)),
     assert(attribute(A)),
     insert(attn(A,_)),
     insert(rdas(_,_,A)).

Figure 9. Insertion into attribute.



If the variable R in the operation
insert(relation(R)) in Figure 7 is uninstantiated
when the insertion into relation relation
is called, the system produces a new
surrogate and proceeds with the insertion.
If R is instantiated, two possibilities exist:
R is already in relation relation, in which
case the insertion succeeds with the
database state unchanged; or R is not in
relation relation, in which case we insert it.
Since all relations must have some attributes
and a name, we propagate by triggering
insertions into the relations rdas
and reln of the tuples (R,_,_) and (_,R),
respectively, indicating that the relation
surrogate is at this point the only thing we
know.

In the operation insert(reln(N,R)), in
Figure 8, the first two rules produce or
request from the user any uninstantiated
variable values. The insertion operation
succeeds with the database state unchanged
if a relation with that particular
name is already in the database. If no
relation with the name N exists and the
relation represented by surrogate R does not
already have a name, we assert the tuple
and propagate by inserting R in relation
relation.
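
The four cases of insert(reln(N,R)) can be traced in a few lines of Python. This is a flattened sketch under stated assumptions, not the article's interpreter: the function names insert_reln and new_surrogate, the dictionary-of-sets database, and the ask callback that stands in for write/break/read are all hypothetical, and the final propagation to insert(relation(R)) is reduced to adding the surrogate directly.

```python
def new_surrogate(db):
    """Produce a relation surrogate that is not yet in use (assumed scheme)."""
    n = len(db["relation"]) + 1
    while f"r{n}" in db["relation"]:
        n += 1
    return f"r{n}"


def insert_reln(db, name=None, rel=None, ask=None):
    """Sketch of insert(reln(N,R)); None marks an uninstantiated variable."""
    if rel is None:                         # var(R): produce a surrogate
        rel = new_surrogate(db)
    if name is None:                        # var(N): request it from the user
        name = ask("relation name? ")
    if (name, rel) in db["reln"]:           # already present: succeed unchanged
        return True
    name_used = any(n == name for n, _ in db["reln"])
    rel_named = any(r == rel for _, r in db["reln"])
    if not name_used and not rel_named:     # ¬reln(N,_) ∧ ¬reln(_,R)
        db["reln"].add((name, rel))         # assert(reln(N,R))
        db["relation"].add(rel)             # stands in for insert(relation(R))
        return True
    return False                            # no update dependency applies
```

The last return corresponds to the case where no update dependency succeeds, for example when the name is already taken by another relation; the database is then left unchanged.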

In the operation insert(attribute(A)) in 
Figure 9, the first update dependency
produces a new attribute surrogate if needed.
The second succeeds if the surrogate is 
already present. The third makes the asser- 
tion and propagates through insertions in 
attn and rdas. 

In the operation insert(rdas(R,D,A)) in 
Figure 10, a new attribute surrogate is pro- 
duced if needed. If the attribute is not 
already in rdas and relation and domain 
surrogates are not provided, they are re- 
quested. If attribute name uniqueness 
within relations is not about to be violated, 
then the tuple is asserted in rdas. This may 
cause propagation to relations relation, 
domain, and finally attribute. 

The relation rdas is central to the core 
metaschema. Since we want domains to
exist without being currently used in any 
relations, the only direct propagation 
from deleting tuples from rdas is to at- 
tribute and sometimes to relation. 

The definition of the compound update
operation delete(rdas(R,D,A)) in Figure
11 is very special, because it allows the user
free use of any combination of uninstantiated
variables, possibly resulting in
multiple tuple deletions. The multiple
deletions result from extensive use of
recursion, and every single deletion is
automatically propagated during the
process.

If no tuples matching the actual parameter
are present in rdas, the operation
succeeds with the database unchanged. If all
variables are uninstantiated, we don't
allow any change. The operation succeeds
with nothing done to the database. The
remaining combinations of instantiated and
uninstantiated parameters
are illustrated in Figure 12.

Because the attribute uniquely identifies
one tuple, one (domain, relation), all
operations with an attribute surrogate cause
one tuple deletion from rdas, which is
propagated to attribute. If the attribute to
be deleted is the last one in a relation, then
the deletion also propagates to relation. If
only the domain surrogate is given,
rdas(_,d,_), then we must delete all uses of
that domain in any relation. If only the
relation surrogate is given, rdas(r,_,_),
then all attributes for that relation are
deleted from rdas, implying of course the
deletion of the relation too. Finally, if both
relation and domain surrogates are given,
rdas(r,d,_), then we must delete all uses of
the given domain in the given relation.
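
The effect of these cases can be illustrated with a loop in place of the recursion used by the update dependencies. The Python below is a sketch under assumptions: delete_rdas and the dictionary-of-sets database are hypothetical, None marks an uninstantiated parameter, and the propagated deletions from attribute, attn, relation, and reln are carried out inline rather than through separate operations.

```python
def delete_rdas(db, r=None, d=None, a=None):
    """Sketch of delete(rdas(R,D,A)); returns the number of tuples deleted."""
    if r is None and d is None and a is None:
        return 0                                   # "nothing done"
    victims = sorted(
        t for t in db["rdas"]
        if (r is None or t[0] == r)
        and (d is None or t[1] == d)
        and (a is None or t[2] == a)
    )
    for rel, dom, att in victims:
        db["rdas"].discard((rel, dom, att))
        # Propagate to attribute and its names.
        db["attribute"].discard(att)
        db["attn"] = {(x, n) for x, n in db["attn"] if x != att}
        # If that was the relation's last attribute, the relation goes too.
        if not any(t[0] == rel for t in db["rdas"]):
            db["relation"].discard(rel)
            db["reln"] = {(n, x) for n, x in db["reln"] if x != rel}
    return len(victims)
```

Domains are never removed, which reflects the decision that domains may exist without being used in any relation.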

If the attribute is not present in
attribute, the operation delete(attribute(A))
succeeds with the database unchanged.
See Figure 13. In the last update
dependency, if no value is given, the user is
prompted for one and the operation is
tried again. In the second update
dependency, if a value is given and that value is in
attribute, we remove it and propagate by
deleting all its names from attn and all its
uses from rdas.

It takes only simple arguments to see
that if a deletion of an attribute is caused 
by a previous deletion from rdas, then the 
propagation to rdas from this update 
dependency immediately returns with suc- 
cess, because that attribute is not used in 
rdas anymore (by the very first update 
dependency). 

If, on the other hand, this deletion from 
attribute is the original one, the propaga- 
tion back to attribute following the propa- 
gation back to rdas will succeed by the first 
update dependency. 

The operation delete(relation(R)) is 
shown in Figure 14. If a surrogate for the 
relation is given and the relation exists, it is 
removed, its name is deleted, and all at- 
tribute and domain relationships to the 
relation are deleted from rdas. 

In the operation delete(reln(N,R)) in
Figure 15, only the relation surrogate or
the name is needed to uniquely identify a
tuple to be deleted, so there is no internal
recursion in this rule. The propagation of
the deletion to relation succeeds with the
job done, and the returning call of a deletion
from reln stops on the first rule.

In a fully expanded metaschema there
will be several more relations and operations
modeling and controlling all operations
on the specification of operations.



30 



COMPUTER 



insert(rdas(R,D,A))
   → nonvar(A) ∧ rdas(_,_,A).
   → var(A),
     new(attribute(A)),
     insert(rdas(R,D,A)).
   → var(R) ∧ ¬(nonvar(A) ∧ rdas(_,_,A)),
     write("relation surrogate?"),
     break,
     read(R),
     insert(rdas(R,D,A)).
   → var(D) ∧ ¬(nonvar(A) ∧ rdas(_,_,A)),
     write("domain surrogate?"),
     break,
     read(D),
     insert(rdas(R,D,A)).
   → nonvar(A) ∧ nonvar(R) ∧ nonvar(D) ∧ ¬(rdas(_,_,A)) ∧
     ¬(rdas(R,_,B) ∧ attn(A,N) ∧ attn(B,N) ∧ ¬(A=B)),
     assert(rdas(R,D,A)),
     insert(relation(R)),
     insert(domain(D)),
     insert(attribute(A)).

Figure 10. Insertion into rdas.

delete(rdas(R,D,A))
   → var(A) ∧ var(D) ∧ var(R),
     write("Nothing done").
   → nonvar(A) ∧ rdas(R,D,A) ∧ rdas(R,_,B) ∧ ¬(A=B),
     retract(rdas(R,D,A)),
     delete(attribute(A)).
   → nonvar(A) ∧ rdas(R,D,A) ∧ ¬(rdas(R,_,B) ∧ ¬(A=B)),
     retract(rdas(R,D,A)),
     delete(attribute(A)),
     delete(relation(R)).
   → nonvar(D) ∧ var(A) ∧ var(R) ∧ rdas(R,D,A),
     delete(rdas(R,D,A)),
     delete(rdas(_,D,_)).
   → nonvar(R) ∧ var(A) ∧ var(D) ∧ rdas(R,D,A),
     delete(rdas(R,D,A)),
     delete(rdas(R,_,_)).
   → nonvar(R) ∧ nonvar(D) ∧ var(A) ∧ rdas(R,D,A),
     delete(rdas(R,D,A)),
     delete(rdas(R,D,_)).

Figure 11. Deletion from rdas.



Figure 12. Possible instantiated/uninstantiated parameter
combinations for rdas(R,D,A).



delete(attribute(A))
   → ¬(attribute(A)).
   → nonvar(A) ∧ attribute(A),
     retract(attribute(A)),
     delete(rdas(_,_,A)),
     delete(attn(A,_)).
   → var(A),
     write("attribute surrogate?"),
     break,
     read(A),
     delete(attribute(A)).

Figure 13. Deletion from attribute.



delete(relation(R))
   → ¬(relation(R)).
   → var(R),
     write("relation surrogate?"),
     break,
     read(R),
     delete(relation(R)).
   → nonvar(R) ∧ relation(R),
     retract(relation(R)),
     delete(reln(_,R)),
     delete(rdas(R,_,_)).

Figure 14. Deletion from relation.



Farther down the
intension-extension
dimension

We need to control schema definition
from both the metaschema level and the
data dictionary schema level. The first
question is therefore what the initial
content of the intension-extension dimension
is. The second question is how we set up
the initial content of the
intension-extension dimension.

As we shall see, the initial content of the
data dictionary schema includes only the
bare minimum needed to control the
definition and change of application
schemata. A data dictionary schema might
do much more than that. The third
question therefore is how the database
administrator chooses, defines, and enforces
a database management strategy.

The relations and operations in the
metaschema define and control operations
only on the immediate extension of the
metaschema. They define and control
only intralevel propagation of a schema
change. If rules and laws
can be identified, which is difficult, the
metaschema and the data dictionary
schema should also define and control
interlevel propagation caused by changing
an extension interpreted as intension for
the next level. The fourth question
therefore is how we specify and enforce
interlevel propagation.

These four questions will be addressed 
in the following short sections. 



Initial content. We need to control the
definition and change of relations and
operations from both the metaschema and
the data dictionary schema. In the first
case, we must control the definition and
change of the data dictionary schema. In
the second case, we must control the
definition and change of the application
schemata.


The relations and operations defined in 
the previous section are not level specific 
and we can use them at any level where we 
need to control the definition and change 
of schemata at the next level. We therefore 
choose to include the relations and operations
defined previously in both the
metaschema and the data dictionary 



schema, also in accordance with our prin- 
ciple of self-description. 

On the other hand, we do need to be 
level specific when we call and execute an 
operation on some relation. We choose to 
identify the level of data description at 
which a level-specific operation is defined 
by prefixing operation names with an m 
for metaschema, a dd for data dictionary 
schema, and an s for application schema. 

A set of level-specific metaschema
operations can be defined as shown in
Figures 16 and 17. The purpose of the
conditions c1, c2, ... is to protect from
changes that part of the data dictionary
schema that has the twofold role of being a
stored description of the metaschema and
an integral part of the data dictionary
schema controlling application schema
definition and change.

delete(reln(N,R))
   → ¬(reln(N,R)).
   → var(N) ∧ var(R),
     write("relation name?"),
     break,
     read(N),
     delete(reln(N,R)).
   → ¬(var(N) ∧ var(R)) ∧ reln(N,R),
     retract(reln(N,R)),
     delete(relation(R)).

Figure 15. Deletion from reln.

m-delete(reln(N,R))
   → c1,
     delete(reln(N,R)).

Figure 16. Metaschema level-specific
deletion from reln.

m-insert(reln(N,R))
   → [...]
     insert(reln(N,R)).

Figure 17. Metaschema level-specific
insertion into reln.

As a simple example of how the
m-operations protect the stored description
of the metaschema from modifications,
consider the condition c1 in the
operation specification in Figure 16. It
must include the following:

   c1 = ¬(N = name1) ∧ ¬(N = name2) ∧ ... ∧ ¬(N = namen),

where name1, ..., namen are the names of
the metaschema relations.
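
A guard of this kind is easy to picture in code. The following Python sketch is illustrative and rests on assumptions: the set METASCHEMA_RELATION_NAMES lists only the seven core relations named earlier in the article (a fully expanded metaschema would have more), and c1, m_delete_reln, and the delete_reln callback are hypothetical names.

```python
# Names of the core metaschema relations mentioned in the article.
# Assumption: a fully expanded metaschema would add further names.
METASCHEMA_RELATION_NAMES = frozenset(
    {"relation", "domain", "attribute", "reln", "attn", "domn", "rdas"}
)


def c1(name):
    """¬(N = name1) ∧ ... ∧ ¬(N = namen): N names no metaschema relation."""
    return name not in METASCHEMA_RELATION_NAMES


def m_delete_reln(db, name, delete_reln):
    """Level-specific deletion: check c1, then call the plain delete."""
    if not c1(name):
        return False          # the stored metaschema description is protected
    return delete_reln(db, name)


def plain_delete_reln(db, name):
    """Stand-in for delete(reln(N,R)): drop every tuple with this name."""
    db["reln"] = {(n, r) for n, r in db["reln"] if n != name}
    return True
```

The level-specific operation adds nothing but the guard; the actual work is delegated to the non-level-specific operation, mirroring the structure of Figure 16.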

We choose to let the initial content of 
the data dictionary data be an empty set of
relations ready to hold the extension of the 
initial content of the data dictionary 
schema. 

Finally, we choose not to include any 
initial content of application data. 

Our choices for the initial content of the 
intension-extension dimension can be 
summarized as in Figure 18. 

The metaschema consists of the
metaschema relations and the operations
defined previously, and level-specific
metaschema operations such as those defined in
Figures 16 and 17. The metaschema itself
is imaginary, meaning that nothing is
actually stored in the dotted line box. We will
later explain how the relation definitions
of the metaschema itself are hard-wired in
the DL processor.

A description of the metaschema is
explicitly stored in the black box of the data
dictionary schema. Nothing in the black
box can be changed by operations defined
in the metaschema; the metaschema is
self-describing, not self-destructing. The
conditions in the level-specific
metaschema operations prevent changes of the
stored description of the metaschema.
Nothing but the description of the
metaschema is part of the initial content of the
data dictionary schema.

The initial content of the data dictionary
data is an empty set of relation tables
ready to store the extension of the initial
data dictionary schema.

Setting up the initial intension-extension
dimension. In order to explain how
we set up the initial content of the
intension-extension dimension, we must
first make some assumptions about the
DL processor.

The DL processor. The DL processor
(the DMCS) has the definition of
the metaschema built in. This basically
means that it knows the names of the
metaschema relations, how they are
structured, and in which files their extensions
are stored. The DL processor does not
have the definition of the metaschema
operations built in. Instead, it has a search
module that retrieves an operation
specification given an operation call. It locates
m-operations from the data dictionary
schema, dd-operations from the data
dictionary schema, and s-operations from the
data dictionary data.

The execution module executes
non-level-specific operations relative to the
level-specific operations that called them.
If the level cannot be decided from the
call, the operation fails.
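
The lookup rule of the search module and the failure case of the execution module can be sketched as follows. This Python fragment is an illustration under assumptions: the hyphenated spelling of level-specific names (for example "m-insert"), the function name resolve, and the returned triple are hypothetical; only the mapping from level to storage location is taken from the text.

```python
# Where each kind of level-specific operation is located, per the text.
STORE_FOR_LEVEL = {
    "m": "data dictionary schema",
    "dd": "data dictionary schema",
    "s": "data dictionary data",
}


def resolve(op_name, calling_level=None):
    """Decide the level of an operation call and where its spec is stored.

    Returns (level, base operation name, store), or None when the level
    cannot be decided, in which case the operation fails.
    """
    prefix, sep, base = op_name.partition("-")
    if sep and prefix in STORE_FOR_LEVEL:
        level = prefix                       # level-specific call
    else:
        level, base = calling_level, op_name  # inherit the caller's level
    if level not in STORE_FOR_LEVEL:
        return None
    return level, base, STORE_FOR_LEVEL[level]
```

A plain insert called from within a dd-operation is therefore executed at the data dictionary schema level, while the same call issued with no level context fails.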

The DL processor enforces the opera- 
tional semantics of the update dependency 
formalism. As part of this, the specifica- 
tion of primitive operations is built in. 

The setup. To set up the initial content
of the intension-extension dimension, we
replace the search module of the DL
processor by a booster module. The booster
module searches an externally stored set of
level-specific metaschema operation
specifications to retrieve an operation
specification, given an operation call. This
externally stored set of level-specific
metaschema operations differs from the
set we want to store in one sense: the
operations do not have the conditions c1,
c2, ..., cn defined above.

We now issue the series of m-operation
calls, resulting in the insertion of the
metaschema description as part of the data
dictionary schema.


Finally, we replace the booster module 
by the search module, and we are in 
business. 

Note that an insertion into the relation 
relation at any level creates an empty table 
at the next level to hold the extension of 
the inserted relation definition. This is a 
general rule for interlevel propagation.

Expanding the data dictionary schema.
The data dictionary schema must define
and control all operations on data used for
database management. Most importantly,
the data dictionary schema must control
the definition and change of application
schemata, as discussed. But, in addition to
that, the data dictionary schema must
define and control the notions of
authorization, user, schema, program, file,
distribution, etc.

As the database administrator decides
on a database management strategy, he or
she will define relations and operations in
the data dictionary schema to enforce this
strategy, and the part of the data
dictionary schema copied from the
metaschema will gradually be expanded into a
full data dictionary. This means that the
level-specific operations defined in the
data dictionary schema will be more
complicated than those for the metaschema.
As the data dictionary schema expands,
the level-specific operations in the data
dictionary schema must control the
propagation of changes of application schemata
to changes of data dictionary data
controlled by the expanded data dictionary
schema.

In an expanded data dictionary schema,
the level-specific operations are constructed
from the original metaschema operations,
as illustrated in Figure 19.
The conditions c1, c2, ..., cn can test
data controlled by the expansion of the
data dictionary schema and thereby
distinguish alternatives for propagating
changes of data controlled by the initial
data dictionary schema into changes of
data controlled by the full data dictionary
schema. The alternative sequences of
implied operations on the data controlled by
the expanded data dictionary schema are
inserted in the operation specification in
Figure 19, as indicated by the dots.
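
The construction of a level-specific operation from alternatives guarded by conditions can be sketched in Python. This is only an illustration under stated assumptions: the in-memory dictionary standing in for the database, the rule that the first alternative whose condition holds is executed, and the particular conditions c1 and c2 (which test the authorized relation used in a later example) are all hypothetical and not taken from the article's formalism.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical stand-in for the database: relation name -> set of tuples.
Database = Dict[str, Set[tuple]]
Condition = Callable[[Database], bool]
Action = Callable[[Database], None]
Alternative = Tuple[Condition, List[Action]]


def run_operation(db: Database, alternatives: List[Alternative]) -> bool:
    """Execute the first alternative whose condition holds (assumed semantics)."""
    for condition, actions in alternatives:
        if condition(db):
            for action in actions:
                action(db)
            return True
    return False  # no condition held, so the operation fails


def dd_delete_reln(name: str, rel_id: str) -> List[Alternative]:
    """Build a level-specific delete on reln from the plain delete."""

    def delete_reln(db: Database) -> None:
        # Stands in for the non-level-specific delete(reln(N,R)).
        db["reln"].discard((name, rel_id))

    def c1(db: Database) -> bool:
        # Tests data controlled by the expansion of the schema.
        return any(rel == name for (_user, rel, _op) in db["authorized"])

    def drop_authorizations(db: Database) -> None:
        # Implied operation on data controlled by the expanded schema.
        db["authorized"] = {t for t in db["authorized"] if t[1] != name}

    def c2(db: Database) -> bool:
        return True  # fallback: the expansion holds nothing about the relation

    return [(c1, [delete_reln, drop_authorizations]), (c2, [delete_reln])]


db: Database = {
    "reln": {("supplier", "r1"), ("part", "r2")},
    "authorized": {("u1", "supplier", "s_insert")},
}
succeeded = run_operation(db, dd_delete_reln("supplier", "r1"))
```

Here the first alternative applies, so deleting the supplier definition also removes the authorization tuples that mention it.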

Note that the database administrator 
must use the original non-level-specific in- 
sert, delete, and modify operations when 
defining the level-specific data dictionary 
schema operations that constitute the user 
interface for the database designer. 

It is important to realize that the only 
part of the data dictionary schema that 
cannot be changed in any way through the 
metaschema is the initial part controlling
relation definition and change. This 
means that the database administrator can 
design the database management applica- 
tion to suit the particular needs of the 
enterprise, the same way a database 
designer designs an ordinary application. 
If the database administrator prefers a 
particular kind of authorization pro- 
cedure, then he or she can include it. 
Likewise, if the database needs to be
distributed, the distribution model can be
defined appropriately.

In summary, the database administrator
decides on the database management
strategy rather than having to suffer with
an inadequate strategy forced by the
database system vendor. Or, since the
metaschema is explicitly stored, several
independent software houses may offer
off-the-shelf, plug-compatible database
management strategies (that is, data
dictionary schema definitions) that the
database administrator can choose from.
Such plug-compatible data dictionary
schema definitions could only be produced
by the original vendor if the
metaschema were not explicitly described.

Our database system framework supports
the notions of plug-compatible data
and plug-compatible software. Therefore,
we can strip the DMCS to the bones, leaving
only the DMCS functions that are
absolutely essential. We know of no other
database system or data dictionary system
born this naked: they all come with more
of the database management strategy built
in, such as an authorization strategy, and
with built-in data management tools that
are nice and important, but not essential.

In summary, the simplicity and potential
of our framework are based on

• the explicitly stored metaschema de- 
scription that gives the plug-compatibility; 

• the explicitly stored and changeable 
data dictionary schema that allows us to 
design our own database management 
strategy; and 

• the power of the update dependency 
formalism that allows us to fully follow 
the 100 percent principle, which means 
that an intension completely controls its 
extension and thereby relieves the DL pro- 
cessor from enforcing a lot of special rules. 

The following simple example illustrates
how the database administrator may
expand the data dictionary schema.

Example. The database administrator 
can define a simple authorization strategy 
for all application database users by defin- 
ing the relation and operations in the data 
dictionary schema using the m-operations, 
as shown in Figure 20. 

When the database designer defines an 
application schema relation and some 
operations on it, they will look like Figure 
21. (We assume that the DL processor
knows the username U.)

When the database designer has defined
the operations using the dd-operations, he
or she inserts tuples in the relation
authorized, allowing the chosen users to do
the chosen operations.

When user u1 calls an operation, such as
s_insert(supplier(...)), it will succeed only if
the tuple (u1, supplier, s_insert) is stored in
the relation authorized.
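
The authorization check described above can be sketched in Python. This is a minimal illustration, not the article's formalism: relations are modeled as Python sets, operations return True or False for success or failure, and the function names and sample values are hypothetical.

```python
# Relations modeled as sets of tuples (an assumption made for this sketch).
authorized = set()  # tuples (user, relation, operation)
supplier = set()    # tuples (s#, s_name, city)


def dd_insert_authorized(user, relation, operation):
    """Data dictionary level: grant a user the right to call an operation."""
    authorized.add((user, relation, operation))


def dd_delete_authorized(user, relation, operation):
    """Data dictionary level: withdraw that right again."""
    authorized.discard((user, relation, operation))


def s_insert_supplier(user, s_no, s_name, city):
    """Application level: succeeds only if the caller is authorized."""
    if (user, "supplier", "s_insert") not in authorized:
        return False
    supplier.add((s_no, s_name, city))
    return True


def s_delete_supplier(user, s_no, s_name, city):
    """Application level: succeeds only if the caller is authorized."""
    if (user, "supplier", "s_delete") not in authorized:
        return False
    supplier.discard((s_no, s_name, city))
    return True


denied = s_insert_supplier("u1", "s1", "Acme", "Paris")    # no tuple stored yet
dd_insert_authorized("u1", "supplier", "s_insert")
granted = s_insert_supplier("u1", "s1", "Acme", "Paris")   # tuple now stored
```

Note that user u1 is still not allowed to delete suppliers, because no tuple for s_delete has been inserted in authorized.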

This was the simple case, where the 
database administrator trusts that the 
database designer will remember to insert 
the condition on the relation authorized in 
all update dependencies of all s-operations
defined. If the database administrator 



Figure 18. Initial content of the intension-extension dimension.
Levels shown: data dictionary schema, data dictionary data, application.


    dd_delete(reln(N,R))
       → c1,
         delete(reln(N,R)),
         ...
       → c2,
         delete(reln(N,R)),
         ...
       ...
       → cn,
         delete(reln(N,R)),
         ...

Figure 19. Data dictionary schema level-specific deletion from reln.



    authorized

    dd_insert(authorized(U,R,O))
       → assert(authorized(U,R,O)).

    dd_delete(authorized(U,R,O))
       → retract(authorized(U,R,O)).

Figure 20. Authorization described in the data dictionary schema.



    supplier
    | s# | s_name | city |

    s_insert(supplier(S,N,C))
       → authorized(U,supplier,s_insert),
         assert(supplier(S,N,C)).

    s_delete(supplier(S,N,C))
       → authorized(U,supplier,s_delete),
         retract(supplier(S,N,C)).

Figure 21. An example of an application schema.

    s_delete(person(P,M))

Figure 22.



    delete(reln(N,R))
       → reln(N,R) ∧ op_spec(O,R) ∧ opn(O,M) ∧ ((M=dd_delete) ∨ (M=s_delete)),
         write("Delete all tuples in"),
         write(N),
         write("using operation"),
         write(M),
         break,
         retract(reln(N,R)),
         delete(relation(R)).

Figure 23. Instructing user about interlevel propagation.



wants to make sure of this, he or she must
define the dd-operations on the relation
named condition to force all conditions
of s-operations to include
authorized(U, <relation name>, <operation name>).
When the database designer
later calls the dd-operations to define
s-operations, there is no way to exclude or
forget the condition on the relation
authorized in any of the update
dependencies constituting the s-operations.
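
One way to picture this stricter scheme is a defining operation that attaches the authorization condition itself. The Python sketch below is an assumption-laden illustration: dd_define_operation is a hypothetical name, relations are sets, and the designer supplies only the body of the operation.

```python
authorized = set()  # tuples (user, relation, operation)
supplier = set()    # tuples (s#, s_name, city)


def dd_define_operation(relation, op_name, body):
    """Define an s-operation; the authorization condition is always added."""

    def operation(user, *args):
        # The designer cannot leave this test out: it is attached here,
        # by the defining operation, not written by the designer.
        if (user, relation, op_name) not in authorized:
            return False
        body(*args)
        return True

    return operation


# The database designer writes only the body of the operation.
s_insert = dd_define_operation(
    "supplier", "s_insert", lambda s, n, c: supplier.add((s, n, c))
)

authorized.add(("u1", "supplier", "s_insert"))
```

A call by an unauthorized user fails even though the designer never mentioned the authorized relation in the body.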



We have already given a specification of
operations defining and controlling schema
updates and their intralevel propagation.
We now study the interlevel propagation of a
schema update, that is, the effect a schema
update has on the data defined and
controlled by the schema.

Consider Figure 22. If we
want to delete the tuple (person, r1) from
reln, we make the operation call
dd_delete(reln(person, r1)). This operation only
affects tuples in the database defining the
schema. We would somehow like all the
tuples in the relation person to be deleted.

There are three ways in which we can try
to handle the problem of interlevel
propagation. They are all needed. We can

• include a set of rules for interlevel
propagation in the semantic definition of
the DL;

• specify interlevel propagation directly
in the operation specifications; or

• provide a data management tool for
database reorganization.

Including interlevel propagation in the
semantics of the DL. Only general rules
for interlevel propagation for our data
model should be included in the semantic
definition of the DL.

Before we define these rules, let us
illustrate one of the pitfalls of the problem
of interlevel propagation by considering
the deletion of an attribute from a relation
with a nonempty extension. If we think
that the general rule for interlevel
propagation in the relational model in this
situation is enforced by projecting the
extension of the relation over the remaining
attributes, then we are in for a surprise.
The problem is not that we don't know
which duplicate tuples, if any, to get rid of.
The problem is that the relational
projection operator has nothing to do with the
process of deleting an attribute from a
relation definition. In some situations we
may decide that when we delete an attribute
from a relation definition, the extension of
the new relation should be computed by
projection, but this is not a general rule for
the relational model.

The reason for our surprise is that we
have gotten used to the production of an
artificial intension of a relation every time
we use the relational operators on some
relation extensions. What we must realize
is that the relational operators have no
intensional semantics. That is, the intension
produced by the relational operators has
no meaning; it must be assigned by
humans.¹²
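
A small Python sketch may make the point concrete. The employee relation and its values are hypothetical; the sketch only shows that projection mechanically produces a set of tuples, while the question of what each resulting tuple asserts is left open.

```python
def project(extension, positions):
    """Relational projection over the attribute positions that remain."""
    return {tuple(row[i] for i in positions) for row in extension}


# Hypothetical relation employee(e#, name, city) with two distinct employees.
employee = {
    ("e1", "Smith", "Boston"),
    ("e2", "Smith", "Boston"),
}

# Suppose the attribute e# is deleted from the relation definition and the
# new extension is computed by projecting over (name, city).
by_projection = project(employee, (1, 2))
```

The projection leaves a single tuple, ("Smith", "Boston"). Whether that tuple should mean "an employee" or "at least one employee with this name in this city" is a decision about the intension, which the operator itself does not supply.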

We know of only a few general rules for
interlevel propagation in the relational
data model:

(1) A relation definition can be deleted if
the extension of the relation is currently
empty. This rule is implemented in
Chamberlin¹³ and Zloof.¹⁴

(2) A relation definition can be inserted.
The extension of the relation will be defined
to be empty. This rule is implemented in
Chamberlin¹³ and Zloof.¹⁴

(3) An attribute definition can be
deleted from a relation definition if the
extension of the relation is currently empty.
This rule is implemented in Chamberlin¹³
and Zloof.¹⁴

(4) An attribute definition can be
inserted in a relation definition if the
extension of the relation is currently empty. This
rule is implemented in Zloof.¹⁴ It is
generalized in Chamberlin,¹³ where an
attribute definition can be inserted in a relation
with a nonempty extension. The
corresponding values in the extension of the
relation are defined to be null, representing
value unknown or value inapplicable.

(5) A domain definition can be deleted if
it is not part of any relation definition. A
domain definition can be deleted from a
set of relation definitions if all the
attributes defined over the domain can be
deleted from the relation definitions. The
interlevel propagation is through the
deletion of the attribute definitions.

(6) A domain definition can be inserted
in a relation definition if the implied
attribute definition can be inserted in the
relation definition. The interlevel
propagation is through the insertion of the
attribute definition.

(7) A view definition can be deleted
without any interlevel propagation. This
rule is implemented in Chamberlin.¹³

(8) A view definition can be inserted
without any interlevel propagation. This
rule is implemented in Chamberlin.¹³

We include the general rules (1) through
(8) in the semantics of the DL. This means
that when we insert a base relation
definition, the system will create an empty table
to hold the extension of the relation, and
when we delete a base relation definition,
the system will remove the empty table.
All of the above rules for interlevel
propagation, except for the generalization
of rule (4), actually cover situations where
there is no interlevel propagation, except
for the empty tables that are set up or
removed.
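
Rules (1) through (4) can be sketched as checks in a small catalog. The Python below is a hypothetical illustration: the Catalog class, its method names, and the allow_nonempty flag (standing in for the generalization of rule (4)) are assumptions made for this sketch, and extensions are simply lists of dictionaries.

```python
class SchemaError(Exception):
    """Raised when a schema update violates one of the general rules."""


class Catalog:
    def __init__(self):
        self.attributes = {}  # relation name -> list of attribute names
        self.extensions = {}  # relation name -> list of rows (dicts)

    def insert_relation(self, name, attributes):
        # Rule (2): the new relation starts with an empty extension.
        if name in self.attributes:
            raise SchemaError(f"relation {name} is already defined")
        self.attributes[name] = list(attributes)
        self.extensions[name] = []

    def delete_relation(self, name):
        # Rule (1): only allowed while the extension is empty.
        if self.extensions[name]:
            raise SchemaError(f"extension of {name} is not empty")
        del self.attributes[name]
        del self.extensions[name]  # the empty table is removed

    def delete_attribute(self, name, attribute):
        # Rule (3): only allowed while the extension is empty.
        if self.extensions[name]:
            raise SchemaError(f"extension of {name} is not empty")
        self.attributes[name].remove(attribute)

    def insert_attribute(self, name, attribute, allow_nonempty=False):
        # Rule (4); with allow_nonempty, existing rows get a null value.
        if self.extensions[name] and not allow_nonempty:
            raise SchemaError(f"extension of {name} is not empty")
        self.attributes[name].append(attribute)
        for row in self.extensions[name]:
            row[attribute] = None
```

In this sketch the only interlevel propagation is the creation and removal of the empty table, plus the null values written when the generalized form of rule (4) is requested.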

To help the database administrator and
the database designer bring the database
into a state that allows a schema update,
we can let the non-level-specific
operations in the metaschema give us a couple
of hints, as illustrated in the following
example.

Example. In order to enforce a gener- 
alization of rule (1), to allow the deletion 
of a relation with a nonempty extension, 
we could change the definition of the 
delete operation on relation reln as
illustrated in Figure 23.

The condition in the operation checks
that there exists a delete operation for the
level in question. The relations op_spec
and opn are two of the relations in the
expanded metaschema controlling the
definition and change of operation
specifications. The operation simply tells the user
which interlevel propagation must be
taken care of before passing control back
to the operation.

General rules for interlevel propagation 
are more interesting in a data model that 
supports the notion of subtype or is-a rela- 
tionships. We have defined a set of rules 
elsewhere.¹⁵



    dd_delete(reln(N,R))
       → reln(N,R) ∧ op_spec(O,R) ∧ opn(O,s_delete) ∧ (N=person_name),
         s_delete(person_name(_,_)),
         retract(reln(person_name,R)),
         dd_delete(relation(R)).

Figure 24. Specifying interlevel propagation in an operation.



Specifying interlevel propagation in the
operations. Data-dependent rules for
interlevel propagation can be explicitly
modeled in the operations. If the rules are
general for the data model, then they
should be included in the non-level-specific
operations in the metaschema. If
the rules apply to a specific application of
the data model, then they should be
included in the level-specific operations.

Example. Suppose we define two
relations, person_name(p#, name) and
person_address(p#, addr), with
compound update operations enforcing a
referential integrity constraint from
person_address to person_name. This means
that when we insert the tuple (p1, address)
in person_address, then a tuple (p1, name)
must be present in or inserted into
person_name, and vice versa for deletion.

Suppose we want to generalize this rule
to the relation definitions themselves,
meaning that if we delete the relation
person_name, then we want to delete the
definition of the relation person_address,
too.

We can specify this rule in the data dic- 
tionary schema as shown in Figure 24. 

The condition of the operation checks 
that there exists an operation with the 
name s_delete for the relation to be 
deleted. The relations op_spec and opn 
are two of the relations in the expanded 
metaschema used to model and control the 
definition of operations. The operation 
simply enforces the data-dependent rule 
that if the relation definition to be deleted 
is for the relation person_name, then all 
tuples in the extension of this relation must 
first be deleted. 

This technique works very well on data- 
dependent rules for interlevel propaga- 
tion. The technique can be used both in 
level-specific operations in the meta- 
schema and the data dictionary schema. 
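
The operation in Figure 24 can be paraphrased in Python. This is a sketch under assumptions: the dictionary layout of the database, the reading of op_spec as pairs (operation, relation) and of opn as a mapping from an operation to its name, and the sample identifiers r1, r2, o1, and o2 are all hypothetical.

```python
def dd_delete_reln(db, name):
    """Level-specific delete of a relation definition (after Figure 24)."""
    rel_id = db["reln"].get(name)  # condition: reln(N,R)
    if rel_id is None:
        return False
    # Condition: some operation specified for R is named s_delete.
    has_s_delete = any(
        db["opn"].get(op) == "s_delete"
        for (op, rel) in db["op_spec"]
        if rel == rel_id
    )
    # Data-dependent part of the condition: N = person_name.
    if not has_s_delete or name != "person_name":
        return False
    db["extension"][name].clear()   # s_delete(person_name(_,_))
    del db["reln"][name]            # retract(reln(person_name,R))
    db["relation"].discard(rel_id)  # dd_delete(relation(R))
    del db["extension"][name]       # general rule: empty table is removed
    return True


db = {
    "relation": {"r1", "r2"},
    "reln": {"person_name": "r1", "person_address": "r2"},
    "op_spec": {("o1", "r1"), ("o2", "r2")},
    "opn": {"o1": "s_delete", "o2": "s_delete"},
    "extension": {
        "person_name": [("p1", "Ann")],
        "person_address": [("p1", "Oslo")],
    },
}
```

Calling dd_delete_reln(db, "person_name") first empties the extension and then removes the definition, while the same call for person_address fails because the data-dependent condition does not hold.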

Database reorganization. When
interlevel propagation cannot be included in
the semantics of the DL or explicitly
specified in the operations, then we must
resort to tools for database reorganization.
This means that the database
administrator or the database designer will
be responsible for the interlevel
consistency of intensions and extensions.
Database reorganization is often a very
elaborate process involving a system
shutdown. Only a few systems, including
System R¹³ and QBE,¹⁴ support on-line
database reorganization. A powerful
algebra for database reorganization was
proposed for the extended relational
model RM/T.¹⁶

On-line database reorganization is
supported by a self-describing database
system. The general technique follows:

(1) Insert the definition of the new set of
relations.

(2) Insert the definition of the
compound update operations for the new
relations.

(3) Write a data management tool that
uses the delete operations on the old
relations and the insert operations on the new
relations to move and call the data.

(4) Delete the definitions of the old 
relations. 

(5) Rename the new relations. 
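
The five steps above can be sketched for a single relation in Python. This is only an illustration under assumptions: the database is a dictionary, the helper names and the temporary "_new" suffix are hypothetical, and the convert function stands in for whatever the data management tool does to each tuple.

```python
def reorganize(db, old_name, new_attributes, convert):
    """Move one relation to a new definition while the database stays up."""
    new_name = old_name + "_new"  # hypothetical temporary name

    # (1) Insert the definition of the new relation; its extension is empty.
    db[new_name] = {"attributes": list(new_attributes), "rows": []}

    # (2) Define the update operations used on the old and new relations.
    def insert_new(row):
        db[new_name]["rows"].append(row)

    def delete_old():
        return db[old_name]["rows"].pop()

    # (3) The data management tool: delete from the old, insert into the new.
    while db[old_name]["rows"]:
        insert_new(convert(delete_old()))

    # (4) Delete the old definition; its extension is empty by now.
    del db[old_name]

    # (5) Rename the new relation.
    db[old_name] = db.pop(new_name)


db = {
    "person": {
        "attributes": ["p#", "name", "addr"],
        "rows": [("p1", "Ann", "Oslo"), ("p2", "Bob", "Rome")],
    }
}
reorganize(db, "person", ["p#", "name"], lambda row: row[:2])
```

Because step (3) empties the old extension before step (4), the deletion of the old definition satisfies the general rule that a relation definition can be deleted only when its extension is empty.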

This completes our discussion of the 
intension-extension dimension. We have 
described its initial contents and how to set 
it up. We have explained how to expand 
the data dictionary schema. And, we have 
discussed three ways of handling interlevel 
propagation. 



In this article we concentrated on the 
metadata management aspects of a 
self-describing database system, 
and we used the relational data model and 
the formalism for update dependencies to 
specify the conceptual level of a self- 
describing database system. 

The architecture for self-describing
database systems has been accepted by
ANSI/SPARC and is being considered by
the ISO as a reference model for database
systems in the late 1980's and 1990's. We
will undoubtedly see several database
systems develop in this direction.

We are ourselves currently using a
self-describing database system in the design of
a system for scientific information
interchange.